Law of large numbers

An illustration of the Law of Large Numbers using die rolls. As the number of die rolls increases, the average of the values of all the rolls approaches 3.5.

In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

For example, a single roll of a six-sided die produces one of the numbers 1, 2, 3, 4, 5, 6, each with equal probability. Therefore, the expected value of a single die roll is

 \tfrac{1+2+3+4+5+6}{6} = 3.5

According to the law of large numbers, if a large number of dice are rolled, the average of their values (sometimes called the sample mean) is likely to be close to 3.5, with the accuracy increasing as more dice are rolled.
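
This behavior is easy to see in a short simulation. The sketch below (function names are illustrative) averages n fair die rolls with a fixed seed; the sample mean drifts toward 3.5 as n grows.

```python
import random

def sample_mean_of_rolls(n, seed=0):
    """Average of n fair six-sided die rolls."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# the sample mean approaches the expected value 3.5 as n grows
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_rolls(n))
```

With 100,000 rolls the sample mean typically lands within a few hundredths of 3.5, while with only 10 rolls it can easily miss by half a point or more.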

Similarly, when a fair coin is flipped once, the expected value of the number of heads is equal to one half. Therefore, according to the law of large numbers, the proportion of heads in a large number of coin flips should be roughly one half. In particular, the proportion of heads after n flips will almost surely converge to one half as n approaches infinity.

Though the proportion of heads (and tails) approaches half, almost surely the absolute (nominal) difference in the number of heads and tails will become large as the number of flips becomes large. That is, the probability that the absolute difference is a small number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero. Intuitively, the expected absolute difference grows, but at a slower rate than the number of flips, as the number of flips grows.
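
A quick simulation (a minimal sketch; the helper name is illustrative) makes the distinction concrete: on a typical run the absolute difference between heads and tails grows roughly like the square root of n, while the difference divided by n shrinks toward zero.

```python
import random

def heads_minus_tails(n, seed=1):
    """Signed difference (#heads - #tails) after n fair coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads - (n - heads)

# typically |difference| grows while |difference| / n shrinks toward zero
for n in (100, 10_000, 1_000_000):
    d = abs(heads_minus_tails(n))
    print(n, d, d / n)
```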

The LLN is important because it "guarantees" stable long-term results for random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. It is important to remember that the LLN only applies (as the name indicates) when a large number of observations are considered. There is no principle that a small number of observations will converge to the expected value or that a streak of one value will immediately be "balanced" by the others. See the Gambler's fallacy.


History

Diffusion is an example of the law of large numbers, applied to chemistry. Initially, there are solute molecules on the left side of a barrier (purple line) and none on the right. The barrier is removed, and the solute diffuses to fill the whole container. Top: With a single molecule, the motion appears to be quite random. Middle: With more molecules, there is clearly a trend where the solute fills the container more and more uniformly, but there are also random fluctuations. Bottom: With an enormous number of solute molecules (too many to see), the randomness is essentially gone: The solute appears to move smoothly and systematically from high-concentration areas to low-concentration areas. In realistic situations, chemists can describe diffusion as a deterministic macroscopic phenomenon (see Fick's laws), despite its underlying random nature.

The Italian mathematician Gerolamo Cardano (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials.[1] This was then formalized as a law of large numbers. The LLN was first proved by Jacob Bernoulli.[2] It took him over 20 years to develop a sufficiently rigorous mathematical proof which was published in his Ars Conjectandi (The Art of Conjecturing) in 1713. He named this his "Golden Theorem" but it became generally known as "Bernoulli's Theorem". This should not be confused with the principle in physics with the same name, named after Jacob Bernoulli's nephew Daniel Bernoulli. In 1835, S.D. Poisson further described it under the name "La loi des grands nombres" ("The law of large numbers").[3] Thereafter, it was known under both names, but the "Law of large numbers" is most frequently used.

After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev, Markov, Borel, Cantelli and Kolmogorov. These further studies have given rise to two prominent forms of the LLN. One is called the "weak" law and the other the "strong" law. These forms do not describe different laws but instead refer to different ways of describing the mode of convergence of the cumulative sample means to the expected value, and the strong form implies the weak.

Forms

Both versions of the law state that, with virtual certainty, the sample average

\overline{X}_n=\frac1n(X_1+\cdots+X_n)

converges to the expected value

\overline{X}_n \, \to \, \mu \qquad\textrm{for}\qquad n \to \infty

where X1, X2, ... is an infinite sequence of i.i.d. random variables with finite expected value E(X1) = E(X2) = ... = µ < ∞.

An assumption of finite variance Var(X1) = Var(X2) = ... = σ² < ∞ is not necessary. Large or infinite variance will make the convergence slower, but the LLN holds anyway. This assumption is often used because it makes the proofs easier and shorter.
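
As an illustration of the infinite-variance case (a sketch under assumed parameters), a Pareto distribution with shape α = 1.5 has finite mean α/(α−1) = 3 but infinite variance. The sample mean still converges toward 3, just more slowly and erratically than in the finite-variance case.

```python
import random

def pareto_sample_mean(n, alpha=1.5, seed=2):
    """Sample mean of n Pareto(alpha) draws (support [1, inf)).
    For alpha = 1.5 the mean alpha/(alpha-1) = 3 is finite,
    but the variance is infinite."""
    rng = random.Random(seed)
    return sum(rng.paretovariate(alpha) for _ in range(n)) / n

# convergence toward 3 still occurs, despite the infinite variance
for n in (1_000, 100_000):
    print(n, pareto_sample_mean(n))
```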

The difference between the strong and the weak version concerns the mode of convergence being asserted. For interpretation of these modes, see Convergence of random variables.

Weak law

Simulation illustrating the Law of Large Numbers. In each frame, a coin that is red on one side and blue on the other is flipped, and a dot is added to the corresponding column. A pie chart shows the proportion of red and blue so far. Notice that the proportion varies a lot at first, but gradually approaches 50%.

The weak law of large numbers states that the sample average converges in probability towards the expected value [4]


    \overline{X}_n\ \xrightarrow{p}\ \mu \qquad\textrm{when}\ n \to \infty.

That is to say that for any positive number ε,


    \lim_{n\to\infty}\Pr\!\left(\,|\overline{X}_n-\mu| < \varepsilon\,\right) = 1.

Interpreting this result, the weak law states that for any specified nonzero margin, no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations falls within that margin of the expected value.
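
This probability can be estimated directly by Monte Carlo (a minimal sketch; the function name and parameters are illustrative). For fair coin flips with μ = 1/2 and margin ε = 0.05, the chance that the sample mean lands within the margin rises toward 1 as n grows.

```python
import random

def prob_within(n, eps, trials=2_000, seed=3):
    """Monte Carlo estimate of Pr(|sample mean of n fair coin flips - 1/2| < eps)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        hits += abs(mean - 0.5) < eps
    return hits / trials

# the probability of landing within the margin eps = 0.05 rises toward 1 with n
for n in (10, 100, 1_000):
    print(n, prob_within(n, 0.05))
```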

Convergence in probability is also called weak convergence of random variables. This version is called the weak law because random variables may converge weakly (in probability) as above without converging strongly (almost surely) as below.

Strong law

The strong law of large numbers states that the sample average converges almost surely to the expected value [5]


    \overline{X}_n\ \xrightarrow{a.s.}\ \mu \qquad\textrm{when}\ n \to \infty.

That is,


    \Pr\!\left( \lim_{n\to\infty}\overline{X}_n = \mu \right) = 1.

The proof is more complex than that of the weak law. This law justifies the intuitive interpretation of the expected value of a random variable as the “long-term average when sampling repeatedly”.

Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.
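The almost-sure statement is about individual sample paths: along a single infinite sequence of rolls, the running mean eventually stays close to μ. A sketch (helper names are illustrative) that tracks the running mean of one sequence of die rolls shows its late-stage deviations from 3.5 becoming small.

```python
import random

def running_means(n, seed=4):
    """Running averages of the first i rolls, for i = 1, ..., n,
    along a single sequence of fair die rolls."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means

path = running_means(100_000)
# on this one sample path, the running mean eventually stays near 3.5
print(max(abs(m - 3.5) for m in path[50_000:]))
```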

The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem.

Moreover, if the summands are independent but not identically distributed, then


    \overline{X}_n - \operatorname{E}\big[\overline{X}_n\big]\ \xrightarrow{a.s.}\ 0

provided that each Xk has a finite second moment and


    \sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.

This statement is known as Kolmogorov's strong law; see, e.g., Sen & Singer (1993, Theorem 2.3.10).
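
As a concrete (assumed) example of summands satisfying Kolmogorov's condition without being identically distributed: take independent X_k ~ Normal(0, sd = k^(1/4)), so Var[X_k] = √k grows without bound, yet Σ Var[X_k]/k² = Σ k^(−3/2) converges. The centered sample mean still tends to zero.

```python
import random

def centered_mean(n, seed=5):
    """Mean of independent X_k ~ Normal(0, sd = k**0.25), so Var[X_k] = sqrt(k).
    Since sum_k Var[X_k] / k**2 = sum_k k**-1.5 converges,
    Kolmogorov's condition holds."""
    rng = random.Random(seed)
    return sum(rng.gauss(0, k ** 0.25) for k in range(1, n + 1)) / n

# E[mean] = 0, and the centered mean tends to 0 despite the growing variances
for n in (1_000, 100_000):
    print(n, centered_mean(n))
```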

Differences between the weak law and the strong law

The weak law states that for a specified large n, the average \overline{X}_n is likely to be near μ. Thus, it leaves open the possibility that |\overline{X}_n -\mu| > \varepsilon happens an infinite number of times, although at infrequent intervals.

The strong law shows that this almost surely will not occur. In particular, it implies that with probability 1, we have that for any ε > 0 the inequality |\overline{X}_n -\mu| < \varepsilon holds for all large enough n.[6]

Uniform law of large numbers

Suppose f(x,θ) is some function defined for θ ∈ Θ, and continuous in θ. Then for any fixed θ, the sequence {f(X1,θ), f(X2,θ), …} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[f(X,θ)]. This is the pointwise (in θ) convergence.

The uniform law of large numbers states the conditions under which the convergence happens uniformly in θ. If [7]

  1. Θ is compact,
  2. f(x,θ) is continuous at each θ ∈ Θ for almost all x’s,
  3. there exists a dominating function d(x) such that E[d(X)] < ∞, and
     \left\| f(x,\theta) \right\| \leq d(x) \quad\text{for all}\ \theta\in\Theta

Then E[f(X,θ)] is continuous in θ, and


    \sup_{\theta\in\Theta} \left\| \frac1n\sum_{i=1}^n f(X_i,\theta) - \operatorname{E}[f(X,\theta)] \right\|\ \xrightarrow{p}\ 0.
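
A small numerical sketch of the uniform statement, under assumed choices: f(x, θ) = cos(θx) with X ~ Uniform(0, 1), Θ = [0.1, 5] (compact), and dominating function d(x) = 1; here E[f(X, θ)] = sin(θ)/θ. The supremum over a grid of θ values of the gap between the empirical mean and E[f(X, θ)] shrinks as n grows.

```python
import math
import random

def sup_deviation(n, seed=6):
    """Worst-case (over a grid of theta in [0.1, 5]) gap between the sample
    mean of f(X, theta) = cos(theta * X) and E[f(X, theta)] = sin(theta)/theta,
    with X ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    thetas = [0.1 + 4.9 * j / 200 for j in range(201)]
    return max(
        abs(sum(math.cos(t * x) for x in xs) / n - math.sin(t) / t)
        for t in thetas
    )

# the sup-norm deviation shrinks toward 0 as n grows
for n in (100, 10_000):
    print(n, sup_deviation(n))
```

The grid over Θ is only an approximation to the true supremum, but it conveys the uniform (rather than pointwise) character of the convergence.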


Notes

  1. Mlodinow, L. The Drunkard's Walk. New York: Random House, 2008. p. 50.
  2. Jakob Bernoulli, Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis, 1713, Chapter 4, (Translated into English by Oscar Sheynin)
  3. Hacking, Ian. (1983) "19th-century Cracks in the Concept of Determinism"
  4. Loève 1977, Chapter 1.4, page 14
  5. Loève 1977, Chapter 17.3, page 251
  6. Ross (2009)
  7. Newey & McFadden 1994, Lemma 2.4

References

  • Grimmett, G. R. and Stirzaker, D. R. (1992). Probability and Random Processes, 2nd Edition. Clarendon Press, Oxford. ISBN 0-19-853665-8. 
  • Richard Durrett (1995). Probability: Theory and Examples, 2nd Edition. Duxbury Press. 
  • Martin Jacobsen (1992). Videregående Sandsynlighedsregning (Advanced Probability Theory) 3rd Edition. HCØ-tryk, Copenhagen. ISBN 87-91180-71-6. 
  • Loève, Michel (1977). Probability theory 1 (4th ed.). Springer Verlag. 
  • Newey, Whitney K.; McFadden, Daniel (1994). Large sample estimation and hypothesis testing. Handbook of econometrics, vol.IV, Ch.36. Elsevier Science. pp. 2111–2245. 
  • Ross, Sheldon (2009). A first course in probability (8th ed.). Prentice Hall press. ISBN 978-0136033134. 
  • Sen, P. K; Singer, J. M. (1993). Large sample methods in statistics. Chapman & Hall, Inc. 
